
Using Noun Phrases for Navigating Biomedical Literature on Pubmed: How Many Updates Are We Losing Track of?

Abstract

Author-supplied citations are only a fraction of the literature related to a paper. The “related citations” list on PubMed is typically dozens or hundreds of results long and offers no hint as to why these results are related. Using noun phrases derived from the sentences of the paper, we show that it is possible to navigate to PubMed updates more transparently, through search terms that can associate a paper with its citations. The algorithm that generates these search terms automatically extracts noun phrases from the paper using natural language processing tools and ranks them by the number of occurrences in the paper compared to the number of occurrences on the web. We define a search query as citation validated (CV) if there is at least one overlap between the author-supplied citations of the paper and the top 20 search results. When the overlapping citations were written by the same authors as the paper itself, we define the query as CV-S; when they were written by different authors, we define it as CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms is CV-D for 86% of the papers, versus 65% for the top 20 PubMed “related citations.” We hypothesize that these quantities, if computed for the 20 million papers on PubMed, would fall within 5% of these percentages. Averaged across all 883 papers, 5 search terms are CV-D, 10 search terms are CV-S, and 6 unique citations validate these searches. Potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) is on the order of ten per paper, and many more if the remaining searches that are not citation-validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of or interest in that paper.
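The ranking and validation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact ranking formula, so the ratio with add-one smoothing used here is an assumption. Treating "same authors" as "shares at least one author" is also an assumption, as is letting one query carry both labels. The function names `rank_phrases` and `classify_query` and all input structures are hypothetical.

```python
from collections import Counter

TOP_K = 20  # the abstract compares citations against the top 20 search results


def rank_phrases(paper_phrases, web_counts):
    """Rank noun phrases by in-paper frequency relative to web frequency.

    paper_phrases: list of noun phrases extracted from the paper (with repeats).
    web_counts: dict mapping phrase -> number of occurrences on the web.
    The ratio with +1 smoothing is an assumed scoring rule, not the paper's.
    """
    paper_counts = Counter(paper_phrases)
    scores = {
        phrase: count / (web_counts.get(phrase, 0) + 1)
        for phrase, count in paper_counts.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda phrase: (-scores[phrase], phrase))


def classify_query(search_results, citations, paper_authors, authors_of):
    """Return the set of citation-validation labels for one search query.

    search_results: ordered list of result IDs returned for the query.
    citations: set of IDs cited by the paper (author-supplied citations).
    paper_authors: set of author names of the paper itself.
    authors_of: dict mapping a result ID -> set of its author names.
    An empty set means the query is not citation validated.
    """
    labels = set()
    for result_id in search_results[:TOP_K]:
        if result_id not in citations:
            continue
        # Assumption: "same authors" means at least one shared author.
        if authors_of.get(result_id, set()) & paper_authors:
            labels.add("CV-S")
        else:
            labels.add("CV-D")
    return labels


if __name__ == "__main__":
    phrases = ["gene expression", "gene expression", "cell", "noun phrase"]
    web = {"cell": 1000, "gene expression": 9, "noun phrase": 0}
    print(rank_phrases(phrases, web))

    authors = {"p2": {"Smith", "Lee"}, "p3": {"Kim"}}
    print(classify_query(["p1", "p2", "p3"], {"p2", "p3", "p9"}, {"Smith"}, authors))
```

In this toy input, a phrase that is frequent in the paper but rare on the web ranks above a common word such as "cell", which is the intuition behind comparing the two counts. A query whose top results contain none of the paper's citations gets an empty label set.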
